

Beginner’s Guide to Preventing Blog Content Scraping in WordPress

By Editorial Staff | August 30, 2024


Imagine working hard to write a great story or article, only to find someone else claiming it as their own. That’s what happens when people steal the content on your website.

Content theft, or “scraping”, is a big problem for website owners. Scrapers are thieves who copy your work, use it on their own websites, and sometimes even pretend that it is theirs. This can be very frustrating and unfair.

In this article, we will cover what blog content scraping is, how you can reduce and prevent it, and even how you can take advantage of content scrapers for your own benefit.


What Is Blog Content Scraping in WordPress?

Blog content scraping is when content is taken from numerous sources and republished on another site. Usually, this is done automatically via your blog’s RSS feed.

Unfortunately, it is very easy and very common to have your WordPress blog content stolen in this way. If it has happened to you, then you understand how stressful and frustrating it can be.

Sometimes, your content is simply copied and pasted directly onto another website, including your formatting, images, videos, and more.

Other times, your content is reposted with attribution and a link back to your website, but without your permission. Although this can help your SEO, you may want to keep your original content hosted only on your own site.

Why Do Content Scrapers Steal Content?

Some of our users have asked us why scrapers steal content. Usually, the main motivation is to profit from your hard work:

Affiliate commission: Dishonest affiliate marketers may use your content to bring search engine traffic to their site in order to promote their niche products.
Lead generation: Lawyers and realtors may pay someone to add content and build authority in their community, without realizing that it is being scraped from other sources.
Advertising revenue: Blog owners may scrape content to create a hub of knowledge in a certain niche “for the good of the community” and then plaster the site with ads.

Is It Possible to Completely Prevent Content Scraping?

In this article, we will show you some steps you can take to reduce and prevent content scraping. Unfortunately, there is no way to completely stop a determined thief.

That’s why we end this article with a section on how you can take advantage of content scrapers. While you can’t always stop a thief, you may be able to gain some traffic and revenue through the content they have stolen from you.

What Should You Do When You Discover Someone Has Scraped Your Content?

Since it’s not possible to completely stop scrapers, you may one day discover that someone is using content they stole from your blog. You may wonder what to do when that happens.

Here are a few approaches that people take when dealing with content scrapers:

Do nothing: Fighting scrapers can take a lot of time, so some popular bloggers decide to do nothing. Google already sees well-known sites as authorities, but that’s not true of smaller sites, so in our opinion this approach is not always the best.
Take down: You can contact the scraper and ask them to take the content down. If they refuse, then you can submit a takedown notice. You can learn how in our guide on how to easily find and remove stolen content in WordPress.
Take advantage: While we actively work at having content scraped from WPBeginner taken down, we also use a few techniques to get traffic and make money from scrapers. You can learn how in the “Take Advantage of Content Scrapers” section below.

With that being said, let’s take a look at how you can prevent blog content scraping in WordPress. Since this is a comprehensive guide, we have included a table of contents for easier navigation:

Copyright or Trademark Your Blog’s Name and Logo
Make Your RSS Feed More Difficult to Scrape
Disable Trackbacks, Pingbacks, and the REST API
Block the Scraper’s Access to Your WordPress Website
Prevent Image Theft in WordPress
Discourage Manual Copying of Your Content
Take Advantage of Content Scrapers

1. Copyright or Trademark Your Blog’s Name and Logo

Trademark and copyright laws protect your intellectual property rights, brand, and business against many legal challenges. This includes plagiarism and illegal use of your copyrighted material or your brand name and logo.

You should clearly display a copyright notice on your site. Although the content on your website is automatically covered by copyright law, displaying a notice tells visitors that your content is copyrighted and that they cannot use your protected property for business purposes.

For example, you can add a copyright notice with a dynamic date to your WordPress footer. That way, your copyright notice stays up to date.

This may discourage some users from stealing your content. It will also help if you need to send a cease and desist letter or file a DMCA complaint to have your stolen content taken down.

You can also apply for copyright registration online. This process can be complicated, but luckily there are low-cost legal services that can help small businesses and individuals.

You can learn more in our guide on how to trademark and copyright your blog’s name and logo.

2. Make Your RSS Feed More Difficult to Scrape

Since blog content scraping is usually done automatically via your blog’s RSS feed, let’s look at some helpful changes you can make to your feed.

Don’t Include Full Post Content in Your WordPress RSS Feed

You can include just a summary of each post in your RSS feed instead of the full content. The summary consists of an excerpt plus post metadata such as the date, author, and category.

There is certainly debate in the blogging community about whether to use full RSS feeds or summary feeds. We won’t get into that here, except to say that one of the pros of a summary-only feed is that it helps prevent content scraping.

You can change the setting by going to Settings » Reading in your WordPress admin panel. Select the “Excerpt” option and then click the “Save Changes” button.

Now the RSS feed will only show an excerpt of each article. If someone steals your content through your RSS feed, then they will only get the summary and not the full post.

If you would like to tweak the summary, then you can see our guide on how to customize WordPress excerpts.

Optimize Your RSS Feed to Prevent Scraping

There are other ways you can optimize your WordPress RSS feed to protect your content, get more backlinks, increase your web traffic, and more. One of the best is to delay posts from appearing in the RSS feed.

The benefit is that when you delay posts from appearing in your RSS feed, you give search engines time to crawl and index your content before it appears elsewhere, such as on scrapers’ websites. Search engines will then see your site as the authority.

The safest and easiest way to do this is by using WPCode, because it has a recipe that automatically adds the correct custom code to WordPress.

For detailed instructions, see our guide on how to delay posts from appearing in your WordPress RSS feed.

3. Disable Trackbacks, Pingbacks, and the REST API

In the early days of blogging, trackbacks and pingbacks were introduced as a way for blogs to notify each other about links. When someone links to a post on your blog, their website automatically sends a ping to yours.

This pingback then appears in your blog’s comment moderation queue with a link to their website. If you approve it, then they get a backlink and a mention from your site.

This gives spammers an incentive to scrape your site and send trackbacks. Luckily, you can disable trackbacks and pingbacks to give scrapers one less reason to steal your content.

For more details, see our guide on how to disable trackbacks on all future posts. You may also like to learn how to disable trackbacks and pings on existing WordPress posts.

Disable the WordPress REST API

Note: Besides trackbacks and pingbacks, we also recommend disabling the WordPress REST API, because it can make it easier for spammers to scrape your content.

We have a detailed guide on how to disable the WordPress REST API.

All you need to do is install and activate the free WPCode plugin and use its ready-made snippet to disable the REST API.

4. Block the Scraper’s Access to Your WordPress Website

One way to stop scrapers from stealing your content is to take away their access to your website. You can do this manually by blocking their IP address, but most users find it easier to use a security plugin, such as a web application firewall.

Block the Scraper Using a Security Plugin (Recommended)

Blocking scrapers manually is tricky and a lot of work, especially since many hacking attempts and attacks come from a wide range of random IP addresses around the world. It is almost impossible to keep up with all of them.

That’s why you need a web application firewall (WAF) like Wordfence or Sucuri. These act as a shield between your website and all incoming traffic by monitoring your site’s traffic and blocking common security threats before they reach your WordPress site.

On the WPBeginner website, we use Sucuri. It is a website security service that protects your site against such attacks using a website application firewall.

Basically, all of your website’s traffic goes through the security service’s servers, where it is examined for suspicious activity. Suspicious IP addresses are automatically blocked from reaching your website at all. See how Sucuri helped us block 450,000 WordPress attacks in 3 months.

Manually Block or Redirect the Scraper’s IP Address

Advanced users may also want to block a scraper’s IP address manually. This is more work, but you can specifically target the scraper’s address once you have identified it. Web developer Jeff Starr suggests this approach when writing about how he deals with content scrapers.

Note: Adding code to your website’s files can be dangerous. Even a small mistake can cause major errors on your site. That’s why we recommend this method only for advanced users.

You can find the scraper’s IP address by visiting the “Raw Access Logs” in your hosting account’s control panel. Look for IP addresses with an unusually high number of requests and keep a record of them, for example by copying them into a separate text file.

Tip: Make sure that you don’t block yourself, legitimate users, or search engines from accessing your website. Copy a suspicious-looking IP address and use an online IP lookup tool to find out more about it.

Once you are sure that the IP address belongs to a scraper, you can block it using the cPanel “IP Blocker” tool or by adding code like this to your .htaccess file:

`Deny from 123.456.789`

Make sure you replace the IP address in the code with the one that you want to block. You can block multiple IP addresses by entering them on the same line, separated by spaces.

For detailed instructions, see our guide on how to block IP addresses in WordPress.
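
As a rough sketch of what blocking several addresses can look like, the snippet below uses two made-up addresses from the ranges reserved for documentation (203.0.113.7 and 198.51.100.23); they are placeholders, not real scrapers. It shows the same rule in two forms, and you would use only one of them: the older `Deny from` syntax used above, and the `Require` syntax that Apache 2.4 servers expect when the compatibility module is not loaded. Which form your host accepts is an assumption you should check before saving the file.

```apache
# Option 1: older syntax (Apache 2.2, or 2.4 with mod_access_compat).
# Several addresses go on one line, separated by spaces.
Deny from 203.0.113.7 198.51.100.23

# Option 2: newer syntax (Apache 2.4, mod_authz_core).
# Use this INSTEAD of option 1, not together with it.
# Everyone is allowed except the listed addresses.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.7
    Require not ip 198.51.100.23
</RequireAll>
```

Keeping a backup copy of the original .htaccess file makes it easy to undo the change if the site starts returning errors.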

Instead of simply blocking scrapers, Jeff suggests that you can send them dummy RSS feeds. You can create feeds full of Lorem Ipsum and annoying images, or even send them right back to their own website, causing an infinite loop and crashing their server.

To redirect them to a dummy feed, you need to add code like this to your .htaccess file:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.
RewriteRule .* http://dummyfeed.com/feed [R,L]
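
The other idea mentioned above, sending the scraper back to its own site, could look roughly like the sketch below. Everything specific in it is an assumption for illustration: 203.0.113.7 stands in for the scraper’s IP address, `scraper.example` for the scraper’s domain, and the rule presumes your feed lives at `/feed/`, which is the usual address when pretty permalinks are enabled. Unlike the rule above, it matches one exact address and only redirects feed requests, so the rest of your site is untouched.

```apache
RewriteEngine On
# Match one exact IP address (placeholder from a documentation range)
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.7$
# Redirect only requests for the main feed to the scraper's own site
RewriteRule ^feed/?$ http://scraper.example/ [R=302,L]
```

These lines would need to sit above the `# BEGIN WordPress` block in .htaccess so that they run before WordPress handles the request.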

5. Prevent Image Theft in WordPress

It’s not just your written content that you need to protect. You should also prevent image theft in WordPress.

As with text, there is no way to completely stop people from stealing your images, but there are plenty of ways to discourage image theft on a WordPress website.

For example, you can disable hotlinking of your WordPress images. This means that if someone scrapes your HTML content, the images will not load on their site.

This will also reduce your server load and bandwidth usage, improving your WordPress speed and performance.

Alternatively, you can add a watermark to your images that gives you credit. This makes it clear that the scraper has stolen your content.

You can learn these two techniques, as well as other ways to protect your images, in our guide on ways to prevent image theft in WordPress.
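
For readers comfortable editing .htaccess, hotlink protection is often set up with rewrite rules along the lines of the sketch below. It assumes an Apache server with mod_rewrite enabled, and `example.com` is a placeholder for your own domain; the guide linked above may use a different method, such as a plugin or a CDN setting.

```apache
RewriteEngine On
# Let through requests that send no referrer (direct visits, many feed readers)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests that come from your own pages (placeholder domain)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Answer all other image requests with 403 Forbidden
RewriteRule \.(jpe?g|png|gif|webp)$ - [F,NC]
```

If you want your images to keep appearing in image search results, you would add a further `RewriteCond` line for each search engine domain you want to allow.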

6. Discourage Manual Copying of Your Content

Most scrapers use automated tools, but some content thieves may try to copy all or part of your content manually.

One way to make this harder is to prevent them from copying and pasting your text. You can do this by making it harder for them to select the text on your website.

To learn how to stop manual copying of your content, see our step-by-step guide on how to prevent text selection and copy/paste in WordPress.

However, this will not protect your content completely. Remember that tech-savvy users can still view the source code or use the Inspect tool to copy anything they want. Also, this method does not work with all web browsers.

Also keep in mind that not everyone who copies your text is a content thief. For example, some people may want to copy the title to share your post on social media.

That’s why we recommend using this method only if you feel it is really needed for your site.

7. Take Advantage of Content Scrapers

As your blog gets bigger, it becomes almost impossible to stop or keep track of all content scrapers. We still send out DMCA complaints. However, we know that there are tons of other sites stealing our content that we simply cannot keep up with.

Instead, our approach is to try to take advantage of content scrapers. It’s not so bad when you see that you are making money from your stolen content or getting a lot of traffic from a scraper’s website.

Make Internal Linking a Habit to Get Traffic and Backlinks From Scrapers

In our ultimate guide to SEO, we recommend that you make internal linking a habit. By placing links to your other content in your blog posts, you can increase pageviews and reduce the bounce rate on your own site.

But there is a second benefit when it comes to scraping. Internal links will get you valuable backlinks from the people who steal your content. Search engines like Google use backlinks as a ranking signal, so the additional backlinks are good for your SEO.

Finally, these internal links let you steal the scraper’s audience. Talented bloggers place links on interesting keywords, making it tempting for users to click. Visitors to the scraper’s website will click those links too, which will lead them straight back to your own website.

Auto Link Keywords With Affiliate Links to Make Money From Scrapers

If you make money from your website through affiliate marketing, then we recommend that you enable auto-linking in your RSS feeds. This will help you maximize your earnings from readers who only read your website through RSS readers.

Better yet, it will help you make money from the sites that are stealing your content.

Simply use a WordPress plugin like ThirstyAffiliates that automatically replaces assigned keywords with affiliate links. We show you how in our guide on how to automatically link keywords with affiliate links in WordPress.

Promote Your Website in Your RSS Footer

You can use the All in One SEO plugin to add custom items to your RSS footer.

For example, you can add a banner that promotes your own products, services, or content.

The best part is that these banners will also appear on the scraper’s website.

In our case, we always add a small disclaimer at the bottom of posts in our RSS feeds. By doing this, we get a backlink to the original article from the scraper’s site.

This lets Google and other search engines know that we are the authority. It also lets the scraper’s users know that the site is stealing our content.

For more tips, see our guide on how to control your RSS feed footer in WordPress.

We hope this tutorial helped you learn how to prevent blog content scraping in WordPress. You may also want to see our ultimate WordPress security guide or our expert pick of the best content protection plugins for WordPress.


About the Editorial Staff

Editorial Staff at WPBeginner is a team of WordPress experts led by Syed Balkhi with over 16 years of experience in WordPress, Web Hosting, eCommerce, SEO, and Marketing. Started in 2009, WPBeginner is now the largest free WordPress resource site in the industry and is often referred to as the Wikipedia for WordPress.

89 Comments

Moinuddin Waheed

I have many friends who used to talk to me about using RSS feed and make content on their website this way. I was not aware exactly how it worked and what benefits they incurred by doing that.
Scraping others content and showing as if they are themselves have created is an offense but in unethical world who cares. Thanks for making this guide by following which we can prevent our content from scraping and atleast can turn it to our advantage.

Jiří Vaněk

Thank you for the article. I have a blog with over 1200 articles, and I need to start addressing that as well. Thanks for the valuable advice.

- WPBeginner Support
  
  You’re welcome!
  
Toheeb Temitope

Thanks for the post.
But can I even remove the or disable RSS feed totally or is there any special benefit in it.
Then if I want to disable RSS feed totally, how will I do it.
Thanks.

- WPBeginner Support
  
  If you want to disable the RSS feed for your site, our guide below would be helpful:
  
  https://www.wpbeginner.com/wp-tutorials/how-to-disable-rss-feeds-in-wordpress/
  
  RSS feeds can be helpful to certain users of your site who use RSS feed readers to know when a site has new content.
  
  - Moinuddin Waheed
    
    it is good idea to know that we can even disable the RSS feed thus by preventing the potential theft and scraping of the content.
    though disabling the RSS feed has some trade off as well.
    is there any seo disadvantage of disabling the RSS feed?
    or it has nothing to do with seo and ranking ?
    
    - WPBeginner Support
      
      Your RSS feed should not affect your site’s SEO.
Giovanni

Thank you. Exactly the information I need. But do scrapers use RSS feed still in 2019?

- WPBeginner Support
  
  They certainly can and will try to
  
Nergis

We hear so much about getting site content by doing content curation. Is content scrapping the same as content curation? If not what’s the difference between the two?

- WPBeginner Support
  
  Content scraping is taking content from other sites to place on your site without permission, content curation is normally linking to other content within content you have created
  
Kingsley Felix

I am facing these issues, i had 20+ for one of our brands, then we moved elsewhere and they are back again.

- WPBeginner Support
  
  Content scrapers are a constant struggle, sadly.
  
slevin smith

I found a realy bad content scaper from by blog, not only they steal my content, used the same name for they spam blog only separatedwith a – and all description, tag, basicly trying to be me, is used links in rssfeed with my blog, youtube channel, facebook, twitter, pinterest & google plus, which shows up on there spam blog, also found that png images shows up on the front page but jpeg dose not, but that maybe just on blogger.

astrid maria boshuisen

I absolutely love the interlinking-idea. Will have to look at the RSS suggestion, since I forgot how that works exactly, having focussed on writing Kindle e-books for a while (talk about content scraping – zero protection there!.. hence my return to website writing) but I feel I have really got a place to start with protecting my content! Thanks!

Danni Phillips

WOW! So much to take into consideration when starting a blog. My blog is only 2 weeks old. I have used mainly WP Beginner to set up my blog. So much good info set out in a way a newbie can follow.

I don’t know if this works for content scraping but I have installed a plugin called Copyright Proof. It disables right click so that people can not copy and paste your content.

I decided to use this plugin as it was a recommended plugin for author sites.

- Eri
  
  your post can be copied easy , trust me.
  
- Reo
  
  Disabling selection is good method but it only support famous web browser like Chrome, Safari and Opera but not IE and Edge.
  
Dave Coldwell

Another great article, I work as a freelance journalist so I sell a lot of articles and it’s up to the people who buy it to decide on their policies.
But I also have a couple of blogs and affiliate websites so I think I might need to take a look at what’s happening with my content.

Absynth

Does not giving credit where it’s due count as “content scraping”?

Because Jeff Starr wrote this same post at Perishable Press over 5 years ago:

Check the structure and terminology of your article and compare it to the original.

Just sayin.

- WPBeginner Support
  
  We did give credit to Jeff Starr. Please read the actual article before pointing out errors.
  
  - Absynth
    
    Yes my apologies.. I missed that the first time through. My bad
    
Sieu

i has just develop a theme for blogger and that theme need a full feed to work, i worry about scrapping content, i think if many scrapper use my content on their blogger site, which have the same content with my site, backlink point to mysite, my blog will be spam in Google ‘s eye and will be deleted.

Lori

Thanks for this amazing article with useful tips! I actually just got a “Thin Content” penalty from Google. I asked an SEO expert for help, they told me to stop scraping content. They sent me a link of an article I wrote yesterday and thought I had stolen it from another website. The crappy thing is, they were stealing from me, not just that article, but probably a couple thousand articles! They are still in Google search, and I am not. I am being the one penalized! Turns out there are at least three websites scraping my content, not even sure what to do.

Raviraj

Awesome article.

I sort of agree with most of the points you have discussed. Actually few of the points are pretty awesome.

But if your sole business is based on content in your website, shouldn’t we be more careful about scrapers?

I don’t think content theft would ever be good to the owner of the content.

I guess we all should think of opting some preventive measure rather than reactive measure. You can consider using ShieldSquare, a content protection solution to stop content scraping permanently.

Andre

I know this is an old article, but the one source that is NOTORIOUS for allowing content scaping is WordPress with their “Press This” feature. They are basically encouraging this.

Sara

I think I may have finally found the answer to my problem. I have been thinking someone has been stealing my stories and making them into “new” stories. I thought either someone is out to get me or I am losing my mind. I was almost losing my mind over thinking like this. Paranoid. Concerned someone was listening to my private phone calls. When really, all the information has come directly from my blog! This article may have saved my life. Literally. I am not even joking because I have been so afraid that I was going crazy and very selectively trying to talk about it with friends, to get feedback or support and being looked at like I am nuts and need to go to the psych ward for a while. This article makes what has been happening to me, make total sense. Thank you! I am so overwhelmed with relief.

John

Thanks for some tips but a good chunk of this article is not very helpful. Most scrappers are not blind scrappers, the content is generally sucked, looked at by a human eye and then published. Which means that even by taking a minute to look at an article the spam kid is able to publish hundred of copied article a day. Backlinks problem is very easy to circumvent for content scrapper as the feed importers have pre-process options and they generally set it to delink the body. Also I do not see how turning rss into summary may help at all, the feed importers only use the rss to grab the new content link and from there they follow the skeleton of your html, which you have nicely set with proper image, title, link etc tags for the convenience of Google and very easily extract the content.

Obviously blocking the IP is a very good solution. DMCAs are generally a waste of time; they take time to formulate and stupid hosts take time to respond (since spammers choose these host specifically because they’re lax on spam-like activity). Of all, Google is the most frustrating; no matter how many reports you file with them they never take action on any of the stolen content on which they’re showing ads and still rank the crap-spam site well on the search results despite it being easy for their systems to detect copies

- Evie
  
  John, I couldn’t agree with you more. Google got mad at me stating that I was the person stealing my own content. This person stole my content and put it on blogger. The nerve. There needs to be a solution for this. At this point, I just block!
  
WPBeginner Staff

Then perhaps the best way for you is to change the licensing and aggressively send take down notices to content scrappers. Meanwhile keep focusing on creating quality content.

Philipp D

Hi there,
I just stumbled upon your article while looking for answers to some of my concerns.
I, together with some friends, launched a website about DIY in Italy, few months ago, which is working unexpectedly well, rankings are high, lots of traffic, etc. Still, PR is yet 0. Our content has a Creative Commons 4.0 license, because we realyl believe it’s a good way to share contents. HOWEVER:
Some time ago we noticed a PR4 site with lots of traffic copying our top articles, linking back to our homepage (which is not what you’re supposed to do with a CC license, but it’s still ok). The problems are these:
1. there’s a whole lot of smaller sites scraping their (our) content and linking back to them instead of our site
2. the PR4 site and some of the smaller sites somehow rank better than our site
3. there’s strong suggestions that a Google penalty to OUR content has taken place, as it has lower PR than most of the other pages (which have been online for a long time).

We’re in contact with the PR4 site and it’s ok for us if they use our content, as long as they link back to the original article (that’s the whole point of the CC license), BUT we’re trying to find a solution to avoid getting Google penalties: would rel canonical do the job? What is your opinion? Whould we change our license and be more aggressive towards content copying?
Thank you!

- WPBeginner Support
  
  Philipp, If you have not already done so, then you should create a webmaster tools account for your site and submit your sitemap. It helps you figure out if there is a problem with your site, how your site is doing on search, and you can use lots of other tools. It also helps Google better understand where some content first appeared.
  
  We don’t think changing the license will stop content scrapers from copying your content.
  
  - Philipp
    
    Hi! Yes, we set up a Webmaster Tools account, linked the site to our Google+ page, and linked most of the authors to their Google+ profiles using publisher and author tags. Authorship seems to be working fine in search snippets, but so far it doesn’t seem to make much difference in the case of scraped content. Higher-PR pages scraping our content are still on top…
    
Garratt

One of the best ways not to be affected by this is to ping effectively. Pinging and manually submitting pages to Google and Bing gets spiders on your site FAST. They index the pages ASAP, and then when they find duplicate content on other sites, they consider you the authority.

I do, however, have a sneaking suspicion this might have to do with PageRank… But Matt Cutts (webspam team @ Google) has advocated using pingers on this very topic. I’m just not sure how much I can trust what he says, though.

To add more services, go to Settings -> Writing Settings -> Update Services -> Open the “Update services” link in a new tab and copy all the update services. Back in WordPress paste them in the ping list and click save.

Open an account in Bing Webmaster Tools for manual URL submission and fast indexing.

Chris Backe

I recently discovered a guy who was taking an RSS feed from my blog. Bear in mind that my blog has a summary feed with Yoast’s ‘This post was found first on’ line. I sent the guy a thank-you message, basically telling him that he’s giving me backlinks, AND telling Google he’s copying my website (since they can look at the timestamps to see which was published first).

Checked out 2 days later, and all my stuff was mysteriously gone…

- Editorial Staff
  
  Hah yup. Most of these scammers aren’t very bright lol. Glad you got it fixed.
  
  -Syed
  
Ian

Has anyone seen or used this WP anti-scraping plugin? http://wordpress.org/plugins/wordpress-data-guards/ It sounds solid, but very few people have downloaded it. I’m not technical, so I would appreciate opinions on its worth or its effect on SEO.

- Editorial Staff
  
  You can definitely use that plugin. It blocks right clicks and keyboard shortcuts for copying, offers an IP blacklist, etc. Those all prevent manual scraping. However, most content scrapers use automatic tools, so none of those would be super helpful.
  
- Ian
  
  Thanks for your reply. The pro version states it protects you from bot attacks, so I assume that means scraper bots? The price puts me off installing it on all my sites, but I may use it on one just to see how well it works.
  
Mark Conger

This is one of, if not the best, “beginner” articles I’ve ever come across on the web.

After reading it I feel like I just had a meeting with a security consultant.

I’m applying these techniques right frickin now!

Thanks. I’m now a follower of this site.

- Editorial Staff
  
  Thanks for the very kind words Mark
  
Neil Ferree

It’s only happened to me a few times. Some blogger from outside the USA has taken my post word-for-word and posted it to their site as if it were their own. Since it was just a single post with my YT video embedded, I didn’t sweat the details too much, since my channel CTR saw a nice spike in visits anyway.

Edward B. Rockower, Ph.D.

Just want to say thanks, thanks, and thanks!

I just today discovered your website, only read 3 articles so far (including this one)… but I’m extremely impressed.

I’ve only been blogging now for 5 weeks, but finding it addictive, especially seeing the growing traffic and user engagement as a result of my efforts. Seeing 100 visitors to my blog site in one day, and being able to see who’s referring them, motivates me to learn all I can to increase the social media marketing and interactions with new visitors.

Best regards,
@earthlingEd

Debbie Gilbert

I love your website and was floored to read about content scraping! Is there any way to create a watermark that is not distracting to your readers but is dead obvious on the scraper’s site?

- Editorial Staff
  
  You can do hotlink protection among other things to disable images on domains that are not whitelisted.
  
Usman

Is it legal to post a complete article from another website if you write the source website’s name at the bottom of the article?

- Editorial Staff
  
  No.
  
  - Usman
    
    And if we give a direct link to the article at the bottom?
    
    - Dan
      
      It is still not good unless the owner approves it
Abdul Karim

Is there any way or plugin for this?

Someone is copying pictures from my fashion blog and posting them on their forum,

but when I click on an image at that forum, it opens in a new window.

I want a plugin or script so that, if someone copies my images, anyone who clicks on those images is redirected to my blog post related to them.

Is there any plugin for this yet, one that links images to their posts?

- Editorial Staff
  
  None that we know of.
  
  - Abdul Karim
    
    I’ve done it. Just change one setting.
    
    When you upload a picture, the right side shows a URL link setting.
    
    The default setting is Media File.
    You have to change it to the attachment URL.
    
    Then you’re done!
    
    When someone copies your blog images, they give a backlink to your posted page.
    
Anton

If someone takes an article written in English and translates it, using their head and not Google Translate, into some other language, let’s say because the majority of the people in the country of that other language don’t understand English, would you point them out as scrapers anyway? Or what is your opinion on that?
For me personally, I don’t find it extremely problematic. Of course, I believe the “author” should link back to the original article while clarifying that his article is translated.

- Editorial Staff
  
  Unless you have written permission of the author, then it is technically scraping.
  
Greg

This is a tremendous article. After reading it, I hope you do not see me as a content scraper. I have used excerpts from you (curated). I always have the “Read the Full Article” link with your page link there, and many of my posts are also tweeted, and I include your Twitter account in there. If you do not want this, please let me know and I will gladly remove it. I am very appreciative of your work and want to share it with my visitors. It is not intended to steal your visitors but to give good value to mine and send them on to you for more.

- Editorial Staff
  
  Greg, as long as you only display an excerpt and send the user over to our site to read the full article, then it is not scraping. As you said, it is curation. Tons of popular sites do that (e.g., Reddit, Digg, etc.).
  
ryan

My site has a lot of original security articles, and a couple have been scraped. The site that scraped me was in Yahoo! News with my article and had people commenting on it. I dealt with the issue by commenting that I was the original author and replying to a few comments. I had internal links; that’s how I found out so quickly. A trick I am going to write about is spotting people who come from a scraper’s site and having a banner or image appear telling them what happened. The never-ending request suggestion sounds illegal under the Computer Fraud and Abuse Act. I am not a lawyer. I just write about security, so I have to know the security laws for computers.

I do not like it that your form didn’t take my company’s email as a valid email.

- Editorial Staff
  
  Sorry Ryan that our form didn’t approve your business email. Not sure what happened there, but it is meant to approve all valid emails.
  
andre

How do I use this code? Can you provide more details or a tutorial? Thank you.

RewriteCond %{REMOTE_ADDR} 123\.456\.789\.
RewriteRule .* http://dummyfeed.com/feed [R,L]

- Editorial Staff
  
  You would have to edit the .htaccess file.
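
For anyone unsure what those two lines do: the RewriteCond line is a regular expression tested against the visitor’s IP address, and the RewriteRule line redirects matching visitors to the dummy feed. The rules only take effect when the .htaccess file also turns rewriting on with RewriteEngine On. The Python sketch below only illustrates which addresses a pattern of that shape would catch. The 192.0.2. prefix is a placeholder from the documentation address range, and the sketch assumes Python’s re module treats this simple pattern the same way Apache’s regex engine does.

```python
import re

# Hypothetical pattern in the same shape as the RewriteCond line:
# three fixed octets and a trailing dot, so the last octet is left open.
BLOCKED_PREFIX = re.compile(r"192\.0\.2\.")


def is_blocked(remote_addr: str) -> bool:
    """Return True when the address would satisfy the condition."""
    # RewriteCond patterns are unanchored, so search() is the closest match.
    return BLOCKED_PREFIX.search(remote_addr) is not None


for addr in ("192.0.2.15", "192.0.2.200", "192.0.20.15", "198.51.100.7"):
    outcome = "redirected to dummy feed" if is_blocked(addr) else "served normally"
    print(addr, outcome)
```

Because the last octet is open, one rule like this covers every address from 192.0.2.0 to 192.0.2.255, which may be more than the single scraper you meant to block.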
  
Ali Rashid

Nice and informative writeup. I like your approach of taking advantage of the scrapers. However, blocking an IP may not always work: a serious scraper will often use a list of anonymous or free proxies, in which case blacklisting one IP might not be an effective solution, as the scraper will change it often. One solution is to write a small script that detects abnormal traffic from a given IP, say more than 20 hits/sec, and challenges it with a captcha. If there is no reply, put the IP in a temporary blacklist for about 30 minutes. You can harden it with a piece of JavaScript that detects mouse, touch, or keyboard activity after a few hits; if none is detected, you can again put the scraper in the temporary blacklist. It worked like a charm for us.
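
A script along those lines could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the class and method names are made up, the thresholds (20 hits per second, a 30-minute ban) are the ones mentioned in the comment, and the captcha itself is left out. The check() method only reports when a challenge should be shown, and challenge_failed() is what you would call when the captcha goes unanswered.

```python
import time
from collections import defaultdict, deque


class ScraperThrottle:
    """Flag IPs that exceed a hit rate and park them on a temporary blacklist."""

    def __init__(self, max_hits=20, window_seconds=1.0,
                 ban_seconds=30 * 60, clock=time.monotonic):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self.clock = clock                  # injectable so it can be tested
        self._hits = defaultdict(deque)     # ip -> timestamps of recent hits
        self._banned_until = {}             # ip -> time the ban expires

    def check(self, ip):
        """Record one hit and return "allow", "challenge", or "blocked"."""
        now = self.clock()
        expires = self._banned_until.get(ip)
        if expires is not None:
            if now < expires:
                return "blocked"
            del self._banned_until[ip]      # the ban has run out

        hits = self._hits[ip]
        hits.append(now)
        # Drop hits that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return "challenge" if len(hits) > self.max_hits else "allow"

    def challenge_failed(self, ip):
        """Ban the IP temporarily after an unanswered captcha."""
        self._banned_until[ip] = self.clock() + self.ban_seconds
        self._hits.pop(ip, None)
```

The ban is temporary on purpose: proxy addresses get recycled, so a permanent block would eventually hit real visitors.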

Arihant

Your solutions are good enough for content scrapers.
But what if people are manually copying and pasting content into their Facebook pages?
We have implemented Tynt, but they remove the link back to the original article. Any ideas on how to handle this kind of situation?

- Editorial Staff
  
  If people really want to steal your content, there is nothing you can do about it. It’s a sad truth, but it’s a truth.
  
  - Garratt
    
    Actually, there’s a plugin created by IMWealth Builders, probably the only one of their plugins I like. The rest are pretty trashy and involve scraping ecommerce sites (CB, Azon, CJ, etc.) for affiliate commissions.
    
    It’s called “Covert Copy Traffic”. It actually allows you to insert any text before or after a set number of words. So say I set it to add “This content was taken from xxxxxxx.com” after 18 words. Then anytime someone copied and pasted more than 18 words from the website, it would add that text at the bottom; with 17 words or fewer, it would do nothing.
    
    These were just example settings. Pretty useful plugin, works a charm. I’ve tried just about every way I could think of to bypass the text insertion, but it seems to be impossible. Plugin is too stronk.
    
    - Editorial Staff
      
      Sounds like you are describing this tutorial here:
      
      https://www.wpbeginner.com/wp-tutorials/how-to-add-a-read-more-link-to-copied-text-in-wordpress/
    - Garratt
      
      Yeah, that’s right. You can just use that script to say “Content came from yourwebsite.com” rather than “Read More”.
    - Jennae Barker
      
      Is it true that their Amazon etc. programs are scrapers? If that is the case, I have made a whopper of a mistake on a purchase from them. Luckily, I have not used it yet.
    - Garratt
      
      Yeah Jennae, it’s legal in that Amazon allows you to copy content from their pages. It helps their sales; affiliates are the reason Amazon is Amazon.
      
      However Google and other search engines (that matter) just consider it a “thin affiliate site” as in no original content. Therefore they don’t rank unless there’s a certain percentage of original content on the site as well.
      
      A scraper is nothing more than a spider/crawler. Generally it runs in socket mode; however, some run in a browser.
      
      Just because it’s labeled as a scraper doesn’t make it bad per se. I use scrapers and spiders regularly to check my site for unnatural links, and I check others for competition analysis, keyword research, and a variety of other tasks that do not harm anyone but benefit me.
      
      However I don’t like or condone anyone scraping for the purpose of copyright infringement. Which is what this discussion is really about.
      
      Google uses the spider “Googlebot” to index the web, as do hundreds of other search engines with their own spiders; there are thousands, even hundreds of thousands, of spiders crawling the web for a variety of purposes. Google also scrapes websites to “cache” them, as do a lot of important services we need, such as the historical web archives.
Troy

I’m about to begin aggressively searching for sites that are copying my content and having the content removed. I know it is impacting how my site ranks, so I have to do something about it. Any idea how much has to be copied before you can deliver DMCA notices? Is a paragraph in an article enough to legally be able to call it plagiarized?

- Editorial Staff
  
  We are not legal experts here, so we refrain from giving legal advice on this site.
  
Dallas

You fail to mention that any self-respecting autoblogger will strip out links and insert their own affiliate links rather than using your content as it comes, so your approach to getting links from them will usually fail.

- Editorial Staff
  
  Is there such a thing as a self-respecting autoblogger? If they had any self-respect, they would write original content.
  
  - David Halver
    
    Agreed! There’s a very special “Hot Place” near the center of the Earth for Spammers, Scrapers and Auto-Bloggers…
    
VeryCreative

I think that the best idea is to include affiliate links.
After the last Penguin update, my website was penalized. I started to analyze it and discovered that many other sites had copied my content. I don’t know why, but those websites rank better than me in search engines, using my content.

- Editorial Staff
  
  Not just affiliate links. Include as many internal links as you can, because if those sites are linking back to your other pages, then Google will KNOW that you are the authority site.
  
  - Bayer
    
    Hi wpbeginner.com Team. I really appreciate this article, but have one question in regards to having internal links in your pages/posts.
    
    I suppose you mean ‘absolute’ links?? Otherwise this may not work in your favour, once the content has been scraped… Well, so far I have always been going along with relative links, as you do I suppose. Which is the best method? Cheers!
    
    - Editorial Staff
      
      We always use absolute links because it keeps things working smoothly.
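
Why absolute links matter once content is scraped: a relative link is resolved against whatever page it appears on, so on a scraper’s copy it points at the scraper’s own domain, while an absolute link still points home. The small Python sketch below uses the standard library’s urljoin to show the difference. The domains and paths are placeholders, not real sites.

```python
from urllib.parse import urljoin

# Placeholder addresses for the original post and a scraped copy of it.
ORIGINAL_PAGE = "https://example.com/blog/my-post/"
SCRAPER_PAGE = "https://scraper.example.net/stolen/my-post/"


def resolve(page_url: str, href: str) -> str:
    """Return the URL a browser would open for href found on page_url."""
    return urljoin(page_url, href)


relative_link = "/blog/related-post/"
absolute_link = "https://example.com/blog/related-post/"

# On the original site both forms lead to the same place.
print(resolve(ORIGINAL_PAGE, relative_link))  # https://example.com/blog/related-post/

# On the scraped copy the relative link now points at the scraper's domain...
print(resolve(SCRAPER_PAGE, relative_link))   # https://scraper.example.net/blog/related-post/

# ...while the absolute link still sends readers (and crawlers) back home.
print(resolve(SCRAPER_PAGE, absolute_link))   # https://example.com/blog/related-post/
```
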
Gautam Doddamani

First of all, your tutorial is just fantastic. Hats off! Just one doubt: how do I know if a site is a scraper site? I used your method and found that Google Webmaster Tools is reporting 262 links to my site, and there are many sites I don’t know of. So I am confused: how do I check whether a site is a scraper site or an authoritative site? Is there a tool available for that? Thanks in advance!

- Editorial Staff
  
  Trust me, no authority site will ever STEAL your article word-for-word.
  
  - Gautam Doddamani
    
    Yes, that is true… but what if I don’t want to look for my article on those scraping sites? I know my article is there, as it is being reported by GWT, and I just want to block that IP address by inserting those RewriteCond rules in the .htaccess file. I don’t want to waste my time searching those bad sites for my article or requesting them to take down my article…
    
Nathan

Thank you for this article, and for your site in general! I like this so much that I had wondered how I would keep track of this resource. And now I see the subscription options below. What a way to get a comment!

Yeasin

Preventing content scraping is almost impossible. I don’t think content scrapers hurt me in any way. They are just voting that I have some high-quality content. Google is smart enough to detect the original publishers. No one should worry.

mrwindowsx

Really informative. If you use CloudFlare, there is a new app called ScrapeShield, and you can easily protect and track/monitor your site content for free.

- wpbeginner
  
  @mrwindowsx Oh didn’t know that. Thanks for pointing it out.
  
- Gautam Doddamani
  
  Wow, that’s great, man… Do you use CloudFlare? I just wanted your review because I have never used that CDN service. I know it is free and all, but my site load time is already great, so I didn’t think I needed it. Now that ScrapeShield is there, I think I will definitely check it out. What other apps will we get if we start using CloudFlare? Thanks.
  
  - Matt
    
    Hello,
    IMO @cloudflare really is awesome. I have two sites on it (both mine and my wife’s blog) and it really is incredibly fast, but that’s not to mention all of the security, traffic analysis, app support (automatic app installs) that they provide.
    
    I know that all hosting setups are different, but I have both of our sites running on the Media Temple (gs)Grid Service. I can honestly say that our sites run faster now than they did when I was using W3 Total Cache and Amazon S3 as my CDN. Actually, I still use W3TC on my site to minimize & cache my content, but I use CloudFlare for CDN, DNS, and security services.
    
    Highly recommend… Actually, I would really appreciate it if someone at WPBeginner would give us their in-depth, experienced opinion of the CloudFlare services. To me, they have been awesome!
    
shivabeach

You can also get a plugin, whose name eludes me at this time, that does the Google search for you. It also adds a code to your RSS feed that the app searches for.
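
The idea of a code in the feed can be sketched as follows. This is not that plugin’s implementation, only a minimal illustration in Python; the function names, the secret, and the token format are all made up for the example. Each feed item gets a short token derived from a private secret and the post ID, and a page that still contains the token was very likely built from your feed.

```python
import hashlib
import hmac


def fingerprint(secret: str, post_id: int) -> str:
    """Derive a short, hard-to-guess token for one post (hypothetical format)."""
    digest = hmac.new(secret.encode(), str(post_id).encode(), hashlib.sha256)
    return "fp" + digest.hexdigest()[:16]


def tag_feed_item(content: str, secret: str, post_id: int) -> str:
    """Append the token to the text that goes out in the RSS feed."""
    return f"{content}\nRef: {fingerprint(secret, post_id)}"


def was_scraped_from_feed(page_text: str, secret: str, post_id: int) -> bool:
    """Check whether a suspect page still carries the post's token."""
    return fingerprint(secret, post_id) in page_text


feed_item = tag_feed_item("My original article text.", "keep-this-private", 42)
print(was_scraped_from_feed(feed_item, "keep-this-private", 42))
print(was_scraped_from_feed("Someone else's article.", "keep-this-private", 42))
```

Searching for the token in a search engine then finds copies without having to compare full article text. A scraper that strips the last line of each item would defeat this simple version.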

MuhammadWaqas

Great post. I know there are many autoblogs fetching my content. After the Penguin update, my site is getting 3 times more traffic from Google than before, but after reading about many disasters for original content creators, I’m worried about future penalties from Google.

It’s my experience that Google usually respects high-PR sites with good authority backlinks, but my site is just one year old and its PR is less than 5.

I try to contact scrapers, but most of them don’t have contact forms, so I think I’ll try that .htaccess method to block the scrapers’ IP addresses. But on the other hand, some of them can use FeedBurner.

- Garratt
  
  Personally, I don’t bother with RSS, as most users don’t use it. Instead, supply a newsletter feed. It does the same trick, and you get emails to market to (if done correctly). The majority of people are more likely to subscribe to a blog than bookmark an RSS feed, in my experience. So it’s better to turn off RSS. You can do this using WordPress SEO by Yoast and various other plugins.
  
  Then, if you also implement the above-mentioned strategies, you should be good. Remove all unnecessary headers (RSD, WLW, etc.).
  
  There will be a couple still able to scrape effectively but those tricks will diminish a great deal of them.
  